Collaborating Authors

Columbia University



Mechanics of Learned Reasoning 1: TempoBench, A Benchmark for Interpretable Deconstruction of Reasoning System Performance

Holzer, Nikolaus, Fishell, William, Ray, Baishakhi, Santolucito, Mark

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are increasingly excelling and outpacing human performance on many tasks. However, to improve LLM reasoning, researchers either rely on ad-hoc generated datasets or formal mathematical proof systems such as the Lean proof assistant. Whilst ad-hoc generated methods can capture the decision chains of real-world reasoning processes, they may encode some inadvertent bias in the space of reasoning they cover; they also cannot be formally verified. On the other hand, systems like Lean can guarantee verifiability, but are not well-suited to capture the nature of agentic decision chain-based tasks. This creates a gap both in performance for functions such as business agents or code assistants, and in the usefulness of LLM reasoning benchmarks, whereby these fall short in reasoning structure or real-world alignment. We introduce TempoBench, the first formally grounded and verifiable diagnostic benchmark that parametrizes difficulty to systematically analyze how LLMs perform reasoning. TempoBench uses two evaluation benchmarks to break down reasoning ability. First, temporal trace evaluation (TTE) tests the ability of an LLM to understand and simulate the execution of a given multi-step reasoning system. Subsequently, temporal causal evaluation (TCE) tests an LLM's ability to perform multi-step causal reasoning and to distill cause-and-effect relations from complex systems. We find that models score 65.6% on TCE-normal, and 7.5% on TCE-hard. This shows that state-of-the-art LLMs clearly understand the TCE task but perform poorly as system complexity increases. Our code is available at our \href{https://github.com/nik-hz/tempobench}{GitHub repository}.
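The temporal trace evaluation (TTE) task described above asks a model to simulate the step-by-step execution of a multi-step system. A minimal sketch of that kind of task, assuming a deterministic transition system (the function and encoding here are illustrative, not the benchmark's actual format):

```python
def simulate(transitions, start, inputs):
    """Replay a deterministic transition system: the kind of multi-step
    execution a temporal-trace (TTE) question asks a model to reproduce.

    transitions: dict mapping (state, input_symbol) -> next_state
    start:       initial state
    inputs:      sequence of input symbols
    Returns the full visited-state trace, including the start state.
    """
    state, trace = start, [start]
    for sym in inputs:
        state = transitions[(state, sym)]
        trace.append(state)
    return trace
```

A ground-truth trace produced this way can be checked token-by-token against a model's predicted trace, which is what makes such a benchmark formally verifiable.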



Rank-Induced PL Mirror Descent: A Rank-Faithful Second-Order Algorithm for Sleeping Experts

Zhang, Tiantian

arXiv.org Artificial Intelligence

We introduce a new algorithm, \emph{Rank-Induced Plackett--Luce Mirror Descent (RIPLM)}, which leverages the structural equivalence between the \emph{rank benchmark} and the \emph{distributional benchmark} established in \citet{BergamOzcanHsu2022}. Unlike prior approaches that operate on expert identities, RIPLM updates directly in the \emph{rank-induced Plackett--Luce (PL)} parameterization. This ensures that the algorithm's played distributions remain within the class of rank-induced distributions at every round, preserving the equivalence with the rank benchmark. To our knowledge, RIPLM is the first algorithm that is both (i) \emph{rank-faithful} and (ii) \emph{variance-adaptive} in the sleeping experts setting.
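Playing a rank-induced Plackett--Luce distribution, as RIPLM does each round, amounts to sampling a full ranking of the currently awake experts from PL parameters. A minimal sketch of that sampling step via the standard Gumbel-max trick (the function name and score encoding are illustrative; the paper's actual update rule is not shown here):

```python
import math
import random

def plackett_luce_sample(log_scores, awake, rng):
    """Sample a ranking of the awake experts from a Plackett-Luce
    distribution with the given log-scores.

    Adding i.i.d. Gumbel noise to each log-score and sorting in
    descending order yields an exact PL sample: the top item is chosen
    with probability proportional to exp(log_score), and so on
    recursively over the remainder.
    """
    noisy = {i: log_scores[i] - math.log(-math.log(rng.random()))
             for i in awake}
    return sorted(awake, key=lambda i: noisy[i], reverse=True)
```

Restricting the sample to the awake set is what keeps the played distribution inside the rank-induced class in the sleeping-experts setting.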


Spatiotemporally Consistent Indoor Lighting Estimation with Diffusion Priors

Tong, Mutian, Wu, Rundi, Zheng, Changxi

arXiv.org Artificial Intelligence

Indoor lighting estimation from a single image or video remains a challenge due to its highly ill-posed nature, especially when the lighting condition of the scene varies spatially and temporally. We propose a method that estimates from an input video a continuous light field describing the spatiotemporally varying lighting of the scene. We leverage 2D diffusion priors for optimizing such a light field, represented as an MLP. To enable zero-shot generalization to in-the-wild scenes, we fine-tune a pre-trained image diffusion model to predict lighting at multiple locations by jointly inpainting multiple chrome balls as light probes. We evaluate our method on indoor lighting estimation from a single image or video and show superior performance over compared baselines. Most importantly, we highlight results on spatiotemporally consistent lighting estimation from in-the-wild videos, which is rarely demonstrated in previous works.


LLM-based Realistic Safety-Critical Driving Video Generation

Fu, Yongjie, Zha, Ruijian, Tian, Pei, Di, Xuan

arXiv.org Artificial Intelligence

Designing diverse and safety-critical driving scenarios is essential for evaluating autonomous driving systems. In this paper, we propose a novel framework that leverages Large Language Models (LLMs) for few-shot code generation to automatically synthesize driving scenarios within the CARLA simulator, which has flexibility in scenario scripting, efficient code-based control of traffic participants, and enforcement of realistic physical dynamics. Given a few example prompts and code samples, the LLM generates safety-critical scenario scripts that specify the behavior and placement of traffic participants, with a particular focus on collision events. To bridge the gap between simulation and real-world appearance, we integrate a video generation pipeline using Cosmos-Transfer1 with ControlNet, which converts rendered scenes into realistic driving videos. Our approach enables controllable scenario generation and facilitates the creation of rare but critical edge cases, such as pedestrian crossings under occlusion or sudden vehicle cut-ins. Experimental results demonstrate the effectiveness of our method in generating a wide range of realistic, diverse, and safety-critical scenarios, offering a promising tool for simulation-based testing of autonomous vehicles.
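The few-shot setup described above pairs example prompts with code samples before asking for a new scenario. A minimal sketch of how such a prompt might be assembled (the helper and the example snippets are hypothetical, not the authors' pipeline or the CARLA API):

```python
def build_scenario_prompt(examples, request):
    """Assemble a few-shot prompt for scenario-code generation.

    examples: list of (description, script) pairs, where each script is
              a worked code sample realizing its description
    request:  natural-language description of the new scenario to generate
    Returns a single prompt string ending with the open-ended request.
    """
    parts = []
    for desc, script in examples:
        parts.append(f"# Scenario: {desc}\n{script}\n")
    parts.append(f"# Scenario: {request}\n")
    return "\n".join(parts)
```

The LLM's completion of the final, script-less entry is then the candidate scenario script to run in the simulator.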


LogicLearner: A Tool for the Guided Practice of Propositional Logic Proofs

Inamdar, Amogh, Macar, Uzay, Vazirani, Michel, Tarnow, Michael, Mustapha, Zarina, Dittren, Natalia, Sadeh, Sam, Verma, Nakul, Salleb-Aouissi, Ansaf

arXiv.org Artificial Intelligence

The study of propositional logic -- fundamental to the theory of computing -- is a cornerstone of the undergraduate computer science curriculum. Learning to solve logical proofs requires repeated guided practice, but undergraduate students often lack access to on-demand tutoring in a judgment-free environment. In this work, we highlight the need for guided practice tools in undergraduate mathematics education and outline the desiderata of an effective practice tool. We accordingly develop LogicLearner, a web application for guided logic proof practice. LogicLearner consists of an interface to attempt logic proofs step-by-step and an automated proof solver to generate solutions on the fly, allowing users to request guidance as needed. We pilot LogicLearner as a practice tool in two semesters of an undergraduate discrete mathematics course and receive strongly positive feedback for usability and pedagogical value in student surveys. To the best of our knowledge, LogicLearner is the only learning tool that provides an end-to-end practice environment for logic proofs with immediate, judgment-free feedback.
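Step-by-step proof practice of the kind described above hinges on checking each student step against an inference rule. A minimal sketch of validating one such step, assuming a tuple encoding of formulas (the encoding and function are illustrative, not LogicLearner's internals):

```python
# Formulas: atoms are strings; implication P -> Q is the tuple ('->', P, Q).

def modus_ponens(premises, conclusion):
    """Return True if `conclusion` follows from `premises` by a single
    application of modus ponens: from P and P -> Q, infer Q."""
    for f in premises:
        if isinstance(f, tuple) and f[0] == '->' and f[2] == conclusion:
            if f[1] in premises:  # the antecedent must also be available
                return True
    return False
```

A practice tool can run such rule checkers over each submitted step to give the immediate, judgment-free feedback the abstract describes.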


Like babies and dancers, this robot learns from studying itself

Popular Science

Researchers from Columbia University have successfully developed an autonomous robot arm capable of learning new motions and adapting to damage simply by watching itself move. The robot observed a video of itself and then used that data to plan its next actions--a practice the researchers refer to as "kinematic self-awareness." This unique learning process is designed to mimic the way humans adjust certain movements by watching themselves in a mirror. Teaching robots to learn this way could reduce the need for extensive training in bespoke 3D simulations. It could also one day make future autonomous robots operating in the real world better equipped to adapt to damage and environmental changes without constant human intervention.


Columbia University moves to hybrid learning on main campus amid antisemitic protests

FOX News

Students at Columbia University have been instructed that classes have shifted to virtual or hybrid amid ongoing safety concerns stemming from anti-Israel protests. The new guidelines said all courses on the Morningside main campus have moved to hybrid learning "until the end of each school's Spring 2024 semester." "Safety is our highest priority as we strive to support our students' learning and all the required academic operations," the school's Provost Angela Olinto wrote in a statement released early Tuesday morning. "It's vital that teaching and learning continue during this time." The announcement comes amid continued antisemitic protests on the New York City campus and just a day after classes were made virtual on Monday.


Uncanny Valley! Watch as a creepy humanoid robot mimics a researcher's facial expressions in real time - with eerie precision

Daily Mail - Science & tech

If we want to live in a world where we interact with robots, they'll have to be able to read and respond to our facial expressions in lightning-fast time. Now, scientists have come a step closer to creating such an advanced machine. 'Emo', built by experts at Columbia University in New York, is the fastest humanoid in the world when it comes to mimicking a person's expressions. In fact, it can 'predict' a person's smile by looking for subtle signs in their facial muscles and imitate it, so that the two are effectively smiling at the same time. Amazing video shows the bot, a silicon-clad robotic face that makes eye contact, copying a researcher's facial expressions in real time with eerie precision and remarkable speed, thanks to cameras in its eyes. Emo is the creation of researchers at Columbia University's Creative Machines Lab in New York, who present their work in a new study in Scientific Reports.